Assignment probabilities based on segment sizes #6

shaycrk · 2014-02-21T22:19:53Z

We've been using this tool to create segments for email campaigns, but have run into an issue when using it to generate segments of very different sizes.

The current code loops over parent campaign members in the order they come out of the underlying datastore and attempts to assign each one to a group, choosing among the groups with equal probability. In practice, this means small groups fill up first (presumably with older contacts who have lower primary keys), and larger groups end up over-representing contacts who are assigned later (presumably newer contacts with larger primary keys). It also hurts performance: the code keeps trying to assign members to the small segments even after they have filled up, which results in many recursive calls to assignMember().
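To make the ordering effect concrete, here is a minimal Python sketch of the behavior described above. It is not the project's code: the name `assign_equal`, the use of integer ids as stand-ins for primary keys, and the retry loop standing in for the recursive assignMember() calls are all assumptions made for illustration.

```python
import random


def assign_equal(member_ids, sizes, seed=0):
    """Assign members in datastore order, picking every segment with equal probability.

    Returns (segments, attempts). `attempts` counts every draw, including
    draws that landed on an already-full segment and had to be retried.
    """
    rng = random.Random(seed)
    segments = [[] for _ in sizes]
    remaining = sum(sizes)  # open slots across all segments
    attempts = 0
    for member in member_ids:
        if remaining == 0:
            break  # every segment is full
        while True:
            attempts += 1
            i = rng.randrange(len(sizes))  # equal probability per segment
            if len(segments[i]) < sizes[i]:
                segments[i].append(member)
                remaining -= 1
                break
            # Segment i is full: draw again (a loop here, standing in for
            # the recursive retry described in the text).
    return segments, attempts


# Hypothetical data: 1000 contacts, where a lower id means an older contact.
members = list(range(1000))
(small, large), attempts = assign_equal(members, sizes=[50, 950])
print("highest id in the small segment:", max(small))
print("draws wasted on a full segment:", attempts - len(members))
```

With one segment of 50 and one of 950, the small segment is typically full after roughly the first hundred members, so it contains only low ids, and every later member still has an even chance of drawing the full segment before landing in the large one.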

This patch assigns a probability to each segment based on its relative size and then makes assignments based on those probabilities. As a result, members are added to smaller segments at a slower rate than to the larger segments, providing a more even distribution of assignments relative to the initial ordering of the contacts and fewer recursive calls.
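The size-weighted approach can be sketched as follows. This is an illustrative Python version written under assumptions, not the patch itself: `build_cdf` and `assign_weighted` are hypothetical names, integer ids stand in for primary keys, and a retry loop stands in for the recursive call. The cumulative distribution is accumulated directly from the sizes, without first building a separate list of per-segment probabilities.

```python
import bisect
import random


def build_cdf(sizes):
    """Return cumulative selection probabilities, one entry per segment."""
    total = sum(sizes)
    cdf = []
    running = 0
    for size in sizes:
        running += size
        cdf.append(running / total)  # the last entry is exactly 1.0
    return cdf


def assign_weighted(member_ids, sizes, seed=0):
    """Assign members in order, picking each segment in proportion to its size.

    Returns (segments, attempts); `attempts` includes retried draws.
    """
    rng = random.Random(seed)
    cdf = build_cdf(sizes)
    segments = [[] for _ in sizes]
    remaining = sum(sizes)  # open slots across all segments
    attempts = 0
    for member in member_ids:
        if remaining == 0:
            break  # every segment is full
        while True:
            attempts += 1
            # rng.random() is in [0, 1), so the index of the first CDF
            # entry greater than the draw is always a valid segment index.
            i = bisect.bisect_right(cdf, rng.random())
            if len(segments[i]) < sizes[i]:
                segments[i].append(member)
                remaining -= 1
                break
            # A segment can still fill up early by chance; draw again.
    return segments, attempts


# Hypothetical data: 1000 contacts, where a lower id means an older contact.
members = list(range(1000))
(small, large), attempts = assign_weighted(members, sizes=[50, 950])
print("highest id in the small segment:", max(small))
print("draws wasted on a full segment:", attempts - len(members))
```

Because the small segment is chosen only about 5% of the time in this example, it fills at roughly the same relative pace as the large one, so its members are spread across the whole id range and retries only become likely near the end of the run.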

Patch to assign members to each list with a probability based on the relative size of each list to generate. This fixes a bug in which members were assigned to lists with equal probability (until each list hit its specified size), causing small lists to fill up with older members (lower primary keys) and larger lists to over-represent newer members.

shaycrk added 7 commits February 21, 2014 12:24

missing semicolons

b5c2cee

Added missing semicolons.

Generate CDF directly

8e03bb6

Save one loop through the sizes array by generating the CDF directly, rather than creating a PDF first.

debug

731fbc4

bug fixes

comment debug statement

8f0c0cc

Comment out system.debug() statement

New methodology for batch segments as well

b994a89

The earlier commit won't cover batched loading for large lists, so updating that file with the same size-based population.

Figure out probabilities for batch load

b6bb467

Generate the CDF when setting up the batch loader for large segments.
